Dictionaries, (Sets, Tuples)

Source: This materials is adapted from Python for Biologists and Learn Python 3 in Y Minutes.

You can read more about dictionaries and tuples in the Python for Everyone book.

Storing paired data

Suppose we want to count the number of As in a DNA sequence. Carrying out the calculation is quite straightforward – in fact it’s one of the first things we did in talking about strings.


In [15]:
dna = "ATCGATCGATCGTACGCTGA"
a_count = dna.count("A")

How will our code change if we want to generate a complete list of base counts for the sequence? We’ll add a new variable for each base:


In [16]:
dna = "ATCGATCGATCGTACGCTGA"
a_count = dna.count("A")
t_count = dna.count("T")
g_count = dna.count("G")
c_count = dna.count("C")

and now our code is starting to look rather repetitive. It’s not too bad for the four individual bases, but what if we want to generate counts for the 16 dinucleotides:


In [17]:
dna = "ATCGATCGATCGTACGCTGA"
aa_count = dna.count("AA")
at_count = dna.count("AT")
ag_count = dna.count("AG")
# ...etc...

or the 64 trinucleotides:


In [18]:
dna = "ATCGATCGATCGTACGCTGA"
aaa_count = dna.count("AAA")
aat_count = dna.count("AAT")
aag_count = dna.count("AAG")
# ...etc...

For trinucleotides and longer, the situation is particularly bad. The DNA sequence is 20 bases long, so it only contains 18 overlapping trinucleotides in total. This means that we’ll end up with 64 different variables, at least 46 of which will hold the value zero.

One possible way round this is to store the values in a list. If we use three nested loops, we can generate all possible trinucleotides, calculate the count for each one, and store all the counts in a list:


In [19]:
dna = "AATGATCGATCGTACGCTGA"
all_counts = []
for base1 in ['A', 'T', 'G', 'C']:
    for base2 in ['A', 'T', 'G', 'C']:
            for base3 in ['A', 'T', 'G', 'C']:
                trinucleotide = base1 + base2 + base3
                count = dna.count(trinucleotide)
                print("count is " + str(count) + " for " + trinucleotide)
                all_counts.append(count)

print(all_counts)


count is 0 for AAA
count is 1 for AAT
count is 0 for AAG
count is 0 for AAC
count is 0 for ATA
count is 0 for ATT
count is 1 for ATG
count is 2 for ATC
count is 0 for AGA
count is 0 for AGT
count is 0 for AGG
count is 0 for AGC
count is 0 for ACA
count is 0 for ACT
count is 1 for ACG
count is 0 for ACC
count is 0 for TAA
count is 0 for TAT
count is 0 for TAG
count is 1 for TAC
count is 0 for TTA
count is 0 for TTT
count is 0 for TTG
count is 0 for TTC
count is 2 for TGA
count is 0 for TGT
count is 0 for TGG
count is 0 for TGC
count is 0 for TCA
count is 0 for TCT
count is 2 for TCG
count is 0 for TCC
count is 0 for GAA
count is 2 for GAT
count is 0 for GAG
count is 0 for GAC
count is 1 for GTA
count is 0 for GTT
count is 0 for GTG
count is 0 for GTC
count is 0 for GGA
count is 0 for GGT
count is 0 for GGG
count is 0 for GGC
count is 0 for GCA
count is 1 for GCT
count is 0 for GCG
count is 0 for GCC
count is 0 for CAA
count is 0 for CAT
count is 0 for CAG
count is 0 for CAC
count is 0 for CTA
count is 0 for CTT
count is 1 for CTG
count is 0 for CTC
count is 1 for CGA
count is 1 for CGT
count is 0 for CGG
count is 1 for CGC
count is 0 for CCA
count is 0 for CCT
count is 0 for CCG
count is 0 for CCC
[0, 1, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0]

Although the code is above is quite compact, and doesn’t require huge numbers of variables, the output shows two problems with this approach:

Firstly, the data are still incredibly sparse – the vast majority of the counts are zero. Secondly, the counts themselves are now disconnected from the trinucleotides. If we want to look up the count for a single trinucleotide – for example, TGA – we first have to figure out that TGA was the 25th trinucleotide generated by our loops. Only then can we get the element at the correct index:


In [20]:
print("count for TGA is " + str(all_counts[24]))


count for TGA is 2

We can try various tricks to get round this problem. What if we generated two lists – one of counts, and one of the trinucleotides themselves?


In [21]:
dna = "AATGATCGATCGTACGCTGA"
all_trinucleotides = []
all_counts = []
for base1 in ['A', 'T', 'G', 'C']:
    for base2 in ['A', 'T', 'G', 'C']:
            for base3 in ['A', 'T', 'G', 'C']:
                trinucleotide = base1 + base2 + base3
                count = dna.count(trinucleotide)
                all_trinucleotides.append(trinucleotide)
                all_counts.append(count)
print(all_counts)
print(all_trinucleotides)


[0, 1, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0]
['AAA', 'AAT', 'AAG', 'AAC', 'ATA', 'ATT', 'ATG', 'ATC', 'AGA', 'AGT', 'AGG', 'AGC', 'ACA', 'ACT', 'ACG', 'ACC', 'TAA', 'TAT', 'TAG', 'TAC', 'TTA', 'TTT', 'TTG', 'TTC', 'TGA', 'TGT', 'TGG', 'TGC', 'TCA', 'TCT', 'TCG', 'TCC', 'GAA', 'GAT', 'GAG', 'GAC', 'GTA', 'GTT', 'GTG', 'GTC', 'GGA', 'GGT', 'GGG', 'GGC', 'GCA', 'GCT', 'GCG', 'GCC', 'CAA', 'CAT', 'CAG', 'CAC', 'CTA', 'CTT', 'CTG', 'CTC', 'CGA', 'CGT', 'CGG', 'CGC', 'CCA', 'CCT', 'CCG', 'CCC']

Now we have two lists of the same length, with a one-to-one correspondence between the elements:

This allows us to look up the count for a given trinucleotide in a slightly more appealing way – we can look up the index of the trinucleotide in the all_trinucleotides list, then get the count at the same index in the all_counts list:


In [22]:
i = all_trinucleotides.index('TGA')
c = all_counts[i]
print('count for TGA is ' + str(c))


count for TGA is 2

This is a little bit nicer, but still has major drawbacks. We’re still storing all those zeros, and now we have two lists to keep track of. We need to be incredibly careful when manipulating either of the two lists to make sure that they stay perfectly synchronized – if we make any change to one list but not the other, then there will no longer be a one-to-one correspondence between elements and we’ll get the wrong answer when we try to look up a count.

This approach is also slow1 . To find the index of a given trinucleotide in the all_trinucleotides list, Python has to look at each element one at a time until it finds the one we’re looking for. This means that as the size of the list grows2 , the time taken to look up the count for a given element will grow alongside it.

If we take a step back and think about the problem in more general terms, what we need is a way of storing pairs of data (in this case, trinucleotides and their counts) in a way that allows us to efficiently look up the count for any given trinucleotide. This problem of storing paired data is incredibly common in programming. We might want to store:

protein sequence names and their sequences DNA restriction enzyme names and their motifs codons and their associated amino acid residues colleagues’ names and their email addresses sample names and their co-ordinates words and their definitions

All these are examples of what we call key-value pairs. In each case we have pairs of keys and values:

Key Value trinucleotide count name protein sequence name restriction enzyme motif codon amino acid residue name email address sample coordinates word definition

The last example in this table – words and their definitions – is an interesting one because we have a tool in the physical world for storing this type of data – a dictionary. Python’s tool for solving this type of problem is also called a dictionary (usually abbreviated to dict) and in this section we’ll see how to create and use them.

Creating a dictionary

The syntax for creating a dictionary is similar to that for creating a list, but we use curly brackets rather than square ones. Each pair of data, consisting of a key and a value, is called an item. When storing items in a dictionary, we separate them with commas. Within an individual item, we separate the key and the value with a colon. Here’s a bit of code that creates a dictionary of restriction enzymes (using data from the previous section) with three items:


In [23]:
enzymes = { 'EcoRI':r'GAATTC', 'AvaII':r'GG(A|T)CC', 'BisI':'GC[ATGC]GC' }

In this case, the keys and values are both strings3 . Splitting the dictionary definition over several lines makes it easier to read:


In [24]:
enzymes = {
    'EcoRI' : r'GAATTC',
    'AvaII' : r'GG(A|T)CC',
    'BisI'  : r'GC[ATGC]GC'
}

but doesn’t affect the code at all. To retrieve a bit of data from the dictionary – i.e. to look up the motif for a particular enzyme – we write the name of the dictionary, followed by the key in square brackets:


In [25]:
print(enzymes['BisI'])


GC[ATGC]GC

The code looks very similar to using a list, but instead of giving the index of the element we want, we’re giving the key for the value that we want to retrieve.

Dictionaries are a very useful way to store data, but they come with some restrictions. The only types of data we are allowed to use as keys are strings and numbers4 , so we can’t, for example, create a dictionary where the keys are file objects. Values can be whatever type of data we like. Also, keys must be unique – we can’t store multiple values for the same key. You might think that this makes dicts less useful, but there are ways round the problem of storing multiple values – we won’t need them for the examples in this chapter, but the chapter on complex data structures in Advanced Python for Biologists gives details.

In real-life programs, it’s relatively rare that we’ll want to create a dictionary all in one go like in the example above. More often, we’ll want to create an empty dictionary, then add key/value pairs to it (just as we often create an empty list and then add elements to it).

To create an empty dictionary we simply write a pair of curly brackets on their own, and to add elements, we use the square-brackets notation on the left-hand side of an assignment. Here’s a bit of code that stores the restriction enzyme data one item at a time:


In [26]:
enzymes = {}
enzymes['EcoRI'] = r'GAATTC'
enzymes['AvaII'] =  r'GG(A|T)CC'
enzymes['BisI'] =  r'GC[ATGC]GC'

We can delete a key from a dictionary using the pop method. pop actually returns the value and deletes the key at the same time:


In [27]:
enzymes = {
    'EcoRI' : r'GAATTC',
    'AvaII' : r'GG(A|T)CC',
    'BisI'  : r'GC[ATGC]GC'
}
# remove the EcoRI enzyme from the dict
enzymes.pop('EcoRI')


Out[27]:
'GAATTC'

Let’s take another look at the trinucleotide count example from the start of the section. Here’s how we store the trinucleotides and their counts in a dictionary:


In [28]:
dna = "AATGATCGATCGTACGCTGA"
counts = {}
for base1 in ['A', 'T', 'G', 'C']:
    for base2 in ['A', 'T', 'G', 'C']:
        for base3 in ['A', 'T', 'G', 'C']:
            trinucleotide = base1 + base2 + base3
            count = dna.count(trinucleotide)
            counts[trinucleotide] = count
 
print(counts)


{'GCA': 0, 'TTA': 0, 'GCG': 0, 'GGT': 0, 'ACC': 0, 'GCC': 0, 'TAC': 1, 'CAT': 0, 'GAT': 2, 'GGA': 0, 'TGA': 2, 'TGG': 0, 'ATA': 0, 'TAG': 0, 'AGT': 0, 'AGA': 0, 'CAC': 0, 'CTA': 0, 'GAG': 0, 'GTC': 0, 'ACG': 1, 'CCT': 0, 'CGT': 1, 'TCT': 0, 'ACA': 0, 'GGG': 0, 'GGC': 0, 'CAG': 0, 'AGC': 0, 'CAA': 0, 'TTC': 0, 'ATG': 1, 'GAC': 0, 'ATT': 0, 'TCC': 0, 'AAA': 0, 'GTT': 0, 'CGA': 1, 'CTT': 0, 'GTG': 0, 'CTC': 0, 'GTA': 1, 'AGG': 0, 'TAA': 0, 'CTG': 1, 'TGC': 0, 'TAT': 0, 'CCC': 0, 'CGG': 0, 'ACT': 0, 'CCG': 0, 'TTG': 0, 'CCA': 0, 'CGC': 1, 'AAC': 0, 'AAT': 1, 'GCT': 1, 'TCG': 2, 'TGT': 0, 'GAA': 0, 'TCA': 0, 'TTT': 0, 'ATC': 2, 'AAG': 0}

We can see from the output that the trinucleotides and their counts are stored together in one variable:

{'ACC': 0, 'ATG': 1, 'AAG': 0, 'AAA': 0, 'ATC': 2, 'AAC': 0, 'ATA': 0, 'AGG': 0, 'CCT': 0, 'CTC': 0, 'AGC': 0, 'ACA': 0, 'AGA': 0, 'CAT': 0, 'AAT': 1, 'ATT': 0, 'CTG': 1, 'CTA': 0, 'ACT': 0, 'CAC': 0, 'ACG': 1, 'CAA': 0, 'AGT': 0, 'CAG': 0, 'CCG': 0, 'CCC': 0, 'CTT': 0, 'TAT': 0, 'GGT': 0, 'TGT': 0, 'CGA': 1, 'CCA': 0, 'TCT': 0, 'GAT': 2, 'CGG': 0, 'TTT': 0, 'TGC': 0, 'GGG': 0, 'TAG': 0, 'GGA': 0, 'TAA': 0, 'GGC': 0, 'TAC': 1, 'TTC': 0, 'TCG': 2, 'TTA': 0, 'TTG': 0, 'TCC': 0, 'GAA': 0, 'TGG': 0, 'GCA': 0, 'GTA': 1, 'GCC': 0, 'GTC': 0, 'GCG': 0, 'GTG': 0, 'GAG': 0, 'GTT': 0, 'GCT': 1, 'TGA': 2, 'GAC': 0, 'CGT': 1, 'TCA': 0, 'CGC': 1}

We still have a lot of repetitive counts of zero, but looking up the count for a particular trinucleotide is now very straightforward:


In [29]:
print(counts['TGA'])


2

We no longer have to worry about either “memorizing” the order of the counts or maintaining two separate lists.

Let’s now see if we can find a way of avoiding storing all those zero counts. We can add an if statement that ensures that we only store a count if it’s greater than zero:


In [30]:
dna = "AATGATCGATCGTACGCTGA"
counts = {}
for base1 in ['A', 'T', 'G', 'C']:
    for base2 in ['A', 'T', 'G', 'C']:
        for base3 in ['A', 'T', 'G', 'C']:
            trinucleotide = base1 + base2 + base3
            count = dna.count(trinucleotide)
            if count > 0:
                counts[trinucleotide] = count
 
print(counts)


{'ATG': 1, 'ACG': 1, 'CGC': 1, 'CTG': 1, 'TAC': 1, 'AAT': 1, 'GTA': 1, 'GCT': 1, 'CGT': 1, 'TCG': 2, 'CGA': 1, 'GAT': 2, 'TGA': 2, 'ATC': 2}

When we look at the output from the above code, we can see that the amount of data we’re storing is much smaller – just the counts for the trinucleotides that are greater than zero:

{'ATG': 1, 'ACG': 1, 'ATC': 2, 'GTA': 1, 'CTG': 1, 'CGC': 1, 'GAT': 2, 'CGA': 1, 'AAT': 1, 'TGA': 2, 'GCT': 1, 'TAC': 1, 'TCG': 2, 'CGT': 1}

Now we have a new problem to deal with. Looking up the count for a given trinucleotide works fine when the count is positive:

In [31]:
print(counts['TGA'])


2

But when the count is zero, the trinucleotide doesn’t appear as a key in the dictionary:


In [32]:
print(counts['AAA'])


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-32-af5ce5c23f08> in <module>()
----> 1 print(counts['AAA'])

KeyError: 'AAA'

so we will get a KeyError when we try to look it up:

There are two possible ways to fix this. We can check for the existence of a key in a dictionary (just like we can check for the existence of an element in a list), and only try to retrieve it once we know it exists:


In [33]:
if 'AAA' in counts:
    print(counts('AAA'))

Alternatively, we can use the dictionary’s get method. get usually works just like using square brackets: the following two lines do exactly the same thing:


In [34]:
print(counts['TGA'])
print(counts.get('TGA'))


2
2

The thing that makes get really useful, however, is that it can take an optional second argument, which is the default value to be returned if the key isn’t present in the dictionary. In this case, we know that if a given trinucleotide doesn’t appear in the dictionary then its count is zero, so we can give zero as the default value and use get to print out the count for any trinucleotide:


In [35]:
print("count for TGA is " + str(counts.get('TGA', 0)))
print("count for AAA is " + str(counts.get('AAA', 0)))
print("count for GTA is " + str(counts.get('GTA', 0)))
print("count for TTT is " + str(counts.get('TTT', 0)))


count for TGA is 2
count for AAA is 0
count for GTA is 1
count for TTT is 0

As we can see from the output, we now don’t have to worry about whether or not each trinucleotide appears in the dictionary – get takes care of everything and returns zero when appropriate:

Iterating over a dictionary

What if, instead of looking up a single item from a dictionary, we want to do something for all items? For example, imagine that we wanted to take our counts dictionary variable from the code above and print out all trinucleotides where the count was 2. One way to do it would be to use our three nested loops again to generate all possible trinucleotides, then look up the count for each one and decide whether or not to print it:


In [37]:
for base1 in ['A', 'T', 'G', 'C']:
    for base2 in ['A', 'T', 'G', 'C']:
        for base3 in ['A', 'T', 'G', 'C']:
            trinucleotide = base1 + base2 + base3
            if counts.get(trinucleotide, 0) == 2:
                print(trinucleotide)


ATC
TGA
TCG
GAT

As we can see from the output, this works perfectly well:

But it seems inefficient to go through the whole process of generating all possible trinucleotides again, when the information we want – the list of trinucleotides – is already in the dictionary. A better approach would be to read the list of keys directly from the dictionary, which is what the keys method does.

Iterating over keys

When used on a dictionary, the keys method returns a list of all the keys in the dictionary:


In [38]:
print(counts.keys())


dict_keys(['ATG', 'ACG', 'CGC', 'CTG', 'TAC', 'AAT', 'GTA', 'GCT', 'CGT', 'TCG', 'CGA', 'GAT', 'TGA', 'ATC'])

Looking at the output5 confirms that this is the list of trinucleotides we want to consider (remember that we’re looking for trinucleotides with a count of two, so we don’t need to consider ones that aren’t in the dictionary as we already know that they have a count of zero):

['ATG', 'ACG', 'ATC', 'GTA', 'CTG', 'CGC', 'GAT', 'CGA', 'AAT', 'TGA', 'GCT', 'TAC', 'TCG', 'CGT']

Using keys, our code for printing out all the trinucleotides that appear twice in the DNA sequence becomes a lot more concise:


In [39]:
for trinucleotide in counts.keys():
    if counts.get(trinucleotide) == 2:
        print(trinucleotide)


TCG
GAT
TGA
ATC

This version prints exactly the same set of trinucleotides as the more verbose method:

Before we move on, take a moment to compare the output immediately above this paragraph with the output from the three-loop version from earlier in this section. You’ll notice that while the set of trinucleotides is the same, the order in which they appear is different. This illustrates an important point about dictionaries – they are inherently unordered. That means that when we use the keys method to iterate over a dictionary, we can’t rely on processing the items in the same order that we added them. This is in contrast to lists, which always maintain the same order when looping. If we want to control the order in which keys are printed we can use the sorted method to sort the list before processing it:

for trinucleotide in sorted(counts.keys()): if counts.get(trinucleotide) == 2: print(trinucleotide)

Iterating over items

In the example code above, the first thing we need to do inside the loop is to look up the value for the current key. This is a very common pattern when iterating over dictionaries – so common, in fact, that Python has a special shorthand for it. Instead of doing this:


In [40]:
for key in my_dict.keys():
    value = my_dict.get(key)
    # do something with key and value


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-40-d12f3f719baa> in <module>()
----> 1 for key in my_dict.keys():
      2     value = my_dict.get(key)
      3     # do something with key and value

NameError: name 'my_dict' is not defined

We can use the items method to iterate over pairs of data, rather than just keys:


In [ ]:
for key, value in my_dict.items():
    # do something with key and value

The items method does something slightly different from all the other methods we’ve seen so far in this book; rather than returning a single value, or a list of values, it returns a list of pairs of values. That’s why we have to give two variable names at the start of the loop. Here’s how we can use the items method to process our dictionary of trinucleotide counts just like before:


In [ ]:
for trinucleotide, count in counts.items():
    if count == 2:
        print(trinucleotide)

This method is generally preferred for iterating over items in a dictionary, as it makes the intention of the code very clear. Many of the problems that we solve by iterating over dicts can also be solved using comprehensions – there’s a whole chapter devoted to comprehensions in Advanced Python for Biologists, so take a look.


In [41]:
# Tuples are like lists but are immutable.
tup = (1, 2, 3)
tup[0]      # => 1
tup[0] = 3  # Raises a TypeError


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-41-afbd86b19b60> in <module>()
      2 tup = (1, 2, 3)
      3 tup[0]      # => 1
----> 4 tup[0] = 3  # Raises a TypeError

TypeError: 'tuple' object does not support item assignment

In [43]:
# Note that a tuple of length one has to have a comma after the last element but
# tuples of other lengths, even zero, do not.
type((1))   # => <class 'int'>
type((1,))  # => <class 'tuple'>
type(())    # => <class 'tuple'>


Out[43]:
tuple

In [42]:
# You can do most of the list operations on tuples too
len(tup)         # => 3
tup + (4, 5, 6)  # => (1, 2, 3, 4, 5, 6)
tup[:2]          # => (1, 2)
2 in tup         # => True


Out[42]:
True

In [44]:
# You can unpack tuples (or lists) into variables
a, b, c = (1, 2, 3)  # a is now 1, b is now 2 and c is now 3
# You can also do extended unpacking
a, *b, c = (1, 2, 3, 4)  # a is now 1, b is now [2, 3] and c is now 4
# Tuples are created by default if you leave out the parentheses
d, e, f = 4, 5, 6
# Now look how easy it is to swap two values
e, d = d, e  # d is now 5 and e is now 4

In [45]:
# Dictionaries store mappings
empty_dict = {}
# Here is a prefilled dictionary
filled_dict = {"one": 1, "two": 2, "three": 3}

In [46]:
# Note keys for dictionaries have to be immutable types. This is to ensure that
# the key can be converted to a constant hash value for quick look-ups.
# Immutable types include ints, floats, strings, tuples.
invalid_dict = {[1,2,3]: "123"}  # => Raises a TypeError: unhashable type: 'list'
valid_dict = {(1,2,3):[1,2,3]}   # Values can be of any type, however.


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-46-a95b8f5e0861> in <module>()
      2 # the key can be converted to a constant hash value for quick look-ups.
      3 # Immutable types include ints, floats, strings, tuples.
----> 4 invalid_dict = {[1,2,3]: "123"}  # => Raises a TypeError: unhashable type: 'list'
      5 valid_dict = {(1,2,3):[1,2,3]}   # Values can be of any type, however.

TypeError: unhashable type: 'list'

In [47]:
# Look up values with []
filled_dict["one"]  # => 1


Out[47]:
1

In [48]:
# Get all keys as an iterable with "keys()". We need to wrap the call in list()
# to turn it into a list. We'll talk about those later.  Note - Dictionary key
# ordering is not guaranteed. Your results might not match this exactly.
list(filled_dict.keys())  # => ["three", "two", "one"]


Out[48]:
['one', 'three', 'two']

In [49]:
# Get all values as an iterable with "values()". Once again we need to wrap it
# in list() to get it out of the iterable. Note - Same as above regarding key
# ordering.
list(filled_dict.values())  # => [3, 2, 1]


Out[49]:
[1, 3, 2]

In [50]:
# Check for existence of keys in a dictionary with "in"
"one" in filled_dict  # => True
1 in filled_dict      # => False


Out[50]:
False

In [51]:
# Looking up a non-existing key is a KeyError
filled_dict["four"]  # KeyError


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-51-d28b0d4988ab> in <module>()
      1 # Looking up a non-existing key is a KeyError
----> 2 filled_dict["four"]  # KeyError

KeyError: 'four'

In [52]:
# Use "get()" method to avoid the KeyError
filled_dict.get("one")      # => 1
filled_dict.get("four")     # => None
# The get method supports a default argument when the value is missing
filled_dict.get("one", 4)   # => 1
filled_dict.get("four", 4)  # => 4


Out[52]:
4

In [53]:
# "setdefault()" inserts into a dictionary only if the given key isn't present
filled_dict.setdefault("five", 5)  # filled_dict["five"] is set to 5
filled_dict.setdefault("five", 6)  # filled_dict["five"] is still 5


Out[53]:
5

In [54]:
# Adding to a dictionary
filled_dict.update({"four":4})  # => {"one": 1, "two": 2, "three": 3, "four": 4}
#filled_dict["four"] = 4        #another way to add to dict

In [55]:
# Remove keys from a dictionary with del
del filled_dict["one"]  # Removes the key "one" from filled dict

In [56]:
# Sets store ... well sets
empty_set = set()
# Initialize a set with a bunch of values. Yeah, it looks a bit like a dict. Sorry.
some_set = {1, 1, 2, 2, 3, 4}  # some_set is now {1, 2, 3, 4}

In [57]:
# Similar to keys of a dictionary, elements of a set have to be immutable.
invalid_set = {[1], 1}  # => Raises a TypeError: unhashable type: 'list'
valid_set = {(1,), 1}


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-57-20a162f7fc40> in <module>()
      1 # Similar to keys of a dictionary, elements of a set have to be immutable.
----> 2 invalid_set = {[1], 1}  # => Raises a TypeError: unhashable type: 'list'
      3 valid_set = {(1,), 1}

TypeError: unhashable type: 'list'

In [58]:
# Can set new variables to a set
filled_set = some_set

In [59]:
# Add one more item to the set
filled_set.add(5)  # filled_set is now {1, 2, 3, 4, 5}

In [60]:
# Do set intersection with &
other_set = {3, 4, 5, 6}
filled_set & other_set  # => {3, 4, 5}


Out[60]:
{3, 4, 5}

In [61]:
# Do set union with |
filled_set | other_set  # => {1, 2, 3, 4, 5, 6}


Out[61]:
{1, 2, 3, 4, 5, 6}

In [62]:
# Do set difference with -
{1, 2, 3, 4} - {2, 3, 5}  # => {1, 4}


Out[62]:
{1, 4}

In [63]:
# Do set symmetric difference with ^
{1, 2, 3, 4} ^ {2, 3, 5}  # => {1, 4, 5}


Out[63]:
{1, 4, 5}

In [64]:
# Check if set on the left is a superset of set on the right
{1, 2} >= {1, 2, 3} # => False


Out[64]:
False

In [65]:
# Check if set on the left is a subset of set on the right
{1, 2} <= {1, 2, 3} # => True


Out[65]:
True

In [66]:
# Check for existence in a set with in
2 in filled_set   # => True
10 in filled_set  # => False


Out[66]:
False

In [ ]: